Multi-campaign ship and aircraft observations of marine cloud condensation nuclei and droplet concentrations

In-situ marine cloud droplet number concentrations (CDNCs), cloud condensation nuclei (CCN), and CCN proxies based on particle sizes and optical properties are compiled from seven field campaigns: ACTIVATE, NAAMES, CAMP2Ex, ORACLES, SOCRATES, MARCUS, and CAPRICORN2. Each campaign involves aircraft measurements, ship-based measurements, or both. Measurements collected over the North and Central Atlantic, Indo-Pacific, and Southern Oceans represent a range of clean to polluted conditions in various climate regimes. With the extensive range of environmental conditions sampled, this data collection is ideal for testing satellite remote detection methods of CDNC and CCN in marine environments. Remote measurement methods are vital to expanding the available data in these difficult-to-reach regions of the Earth and improving our understanding of aerosol-cloud interactions. The data collection includes particle composition and continental tracers to identify potential contributing CCN sources. Several of these campaigns include High Spectral Resolution Lidar (HSRL) and polarimetric imaging measurements and retrievals that will be the basis for the next generation of space-based remote sensors and can therefore be used as satellite surrogates.


North Atlantic Aerosols and Marine Ecosystems Study (NAAMES). NAAMES was conducted over four years through four deployment campaigns that included ship-based aerosol measurements on the R/V Atlantis in the North Atlantic 38. Three of these deployments were complemented by in-situ and remote aerosol and cloud measurements from a C130 aircraft. The study focuses on improving the understanding of the ocean ecosystem-aerosol-cloud system of the western subarctic Atlantic by (1) characterizing plankton ecosystem properties during primary phases of the annual cycle and their dependence on environmental forcing, (2) determining how these phases interact to recreate each year the conditions for an annual plankton bloom, and (3) resolving how remote marine aerosols and boundary layer clouds are influenced by plankton ecosystems 38.

Table 2. Instruments used to measure relevant parameters for each airborne campaign and their approximate diameter range and temporal resolution. All airborne campaigns consist of measurements of aerosol size distributions, CCN, CDNC, HSRL and a Research Scanning Polarimeter (RSP), with the exceptions that SOCRATES and ORACLES 1-2 do not have RSP measurements. * All aerosol particles are dried prior to measurement, except for PCASP measurements. Note, diameter measurements and limits are expressed as either aerodynamic, mobility or optical diameter. ** Instruments were set to various diameter ranges, but log-normal fits are used to account for black carbon mass outside the set range. *** The AMS transmission efficiency varies by particle size and slightly from instrument to instrument. Diameters are aerodynamic diameters.
concentrations, CCN, and cloud properties, (2) to generate a unique dataset for international model intercomparison and process-based studies, (3) to evaluate current remote sensing retrievals and prototypes for future satellite missions, and (4) develop improved satellite-based CDNC and CCN proxy retrievals 39 . The campaigns were conducted in February-March and August-September in 2020, January-March and May-June in 2021, and January-March and May-June in 2022.

Observations of Aerosols Above Clouds and Their Interactions (ORACLES). ORACLES consisted of airborne (NASA P-3 and ER-2 aircraft) in-situ and remote measurements over the South Atlantic, west of Africa 40. Three field campaigns (2016-2018) were conducted during the southern African biomass burning season (August to October). The goals of the ORACLES study were (1) to determine the impact of African-produced biomass burning aerosol on cloud properties and the radiation balance over the South Atlantic and (2) to acquire process understanding of aerosol-cloud-radiation interactions and resulting cloud adjustments that can be applied to models.

Clouds, Aerosol, Monsoon Processes-Philippines Experiment (CAMP2Ex). CAMP2Ex is an airborne mission conducted on the NASA P-3 and SPEC Lear-35 based in Clark, Philippines from 25 August-5 October 2019 41. The campaign is designed to characterize the role of anthropogenic and natural aerosol properties in modulating the frequency and amount of warm and mixed-phase precipitation upstream of the North Subtropical Western Pacific's Southwest Monsoon trough. Notable in CAMP2Ex was its wide variety of aerosol types, including (a) biomass burning from Indonesian peatlands, (b) the Metro Manila super plume, (c) long-range industrial pollution from the Maritime Continent through Southeast Asia and China, and (d) pristine subtropical Pacific air masses. CAMP2Ex also cooperated significantly with the Office of Naval Research Propagation of Intraseasonal Oscillations (PISTON) mission, which provided the R/V Sally Ride with a host of lidars and radars 41.

Southern Ocean Clouds, Radiation, Aerosol Transport Experimental Study (SOCRATES). During SOCRATES, the NSF-NCAR GV aircraft sampled aerosol and clouds in and above the marine boundary layer along primarily north-south transects in January-February 2018, targeting areas of cyclones where models struggle to produce supercooled liquid water. In addition, the main goals of SOCRATES involved characterizing the structure of the marine boundary layer and free troposphere over the Southern Ocean, including the vertical and latitudinal distribution of aerosol and cloud properties 42-45.

Clouds, Aerosols, Precipitation, Radiation, and Atmospheric Composition over the Southern Ocean (CAPRICORN2). CAPRICORN2 consisted of ship-based measurements on the R/V Investigator. Measurements were conducted south of Tasmania, Australia (Fig. 1) and led by the Australian Bureau of Meteorology. The objectives were to (1) characterize cloud, aerosol, and precipitation properties, boundary layer structure, biological production and cycling of dimethyl sulfide, atmospheric composition, and surface energy budget and latitudinal variations, (2) evaluate and improve satellite products (focusing on the NASA A-train and Global Precipitation Measurement mission cloud, precipitation, and surface heat flux products), and (3) evaluate and improve the representation of these properties in the Australian Community Climate and Earth-System Simulator model 42. Measurements were conducted in January-February 2018, overlapping with the SOCRATES campaign.

Table 3. Instruments used to measure relevant parameters for each ship-based campaign and their approximate diameter range and temporal resolution. All campaigns consist of measurements of aerosol size distributions and CCN. * All aerosol particles are dried prior to measurement. Note, diameter measurements and limits are expressed as either aerodynamic, mobility or optical diameter. ** Instruments were set to various diameter ranges, but log-normal fits are used to account for black carbon mass outside the set range. *** The AMS and ACSM transmission efficiency varies by particle size and slightly from instrument to instrument. Diameters are aerodynamic diameters.

Measurements of Aerosols, Radiation, and Clouds over the Southern Ocean (MARCUS). MARCUS is also a ship-based measurement study conducted south of Tasmania, Australia. Measurements were collected on a United States Department of Energy Atmospheric Radiation Measurement Program Mobile Facility deployed on the RSV Aurora Australis as it made four resupply trips to three Australian Antarctic bases (Mawson, Davis, and Casey) from October 2017 to March 2018. The objectives of MARCUS were to (1) understand the synoptically varying vertical structure of Southern Ocean boundary layer clouds and aerosols, (2) quantify the sources and sinks of CCN and ice nucleating particles (INPs), including the role of local biogenic sources over spring, summer and fall, (3) quantify the mechanisms controlling supercooled liquid and mixed-phase clouds, and (4) advance retrievals of clouds, precipitation, and aerosol over the Southern Ocean 42. The MARCUS campaign overlapped with both SOCRATES and CAPRICORN2.

Tables 2, 3 summarize the measurements made on each airborne and ship campaign, respectively. The instrumentation listed in these tables is not exhaustive. This manuscript covers only CDNC, CCN, CCN proxies, and the measurements necessary to identify particle physical and chemical properties and non-marine contributions to particle concentrations. Other measurements not discussed here are accessible through the original campaign archives referenced in the Data Records section.

This section provides specific information for the listed instruments. All concentrations are reported with respect to standard temperature and pressure (273.15 K, 1013 hPa) unless indicated otherwise in the section below. Also, all particle measurements were collected during subsaturated conditions and further dried unless indicated otherwise. Tables 2, 3 identify the campaigns in which each instrument was used.
A campaign is named in the text only when different instrument models were used for the same measurement or when campaign-specific details apply (e.g., four different Condensation Particle Counter (CPC) models are discussed, so the text indicates which model was used on which campaign). The uncertainty, or relative statistical counting error, of instruments counting particles is calculated as sqrt(N)/N, assuming Poisson statistics, where N is the measured number of particles. Measurements are made at 1 Hz frequency or averaged to provide 1 Hz frequency unless otherwise stated.

Particle and gas composition. Remote particle composition measurements are limited and highly uncertain. Particle composition must be included when validating remote CCN measurements because composition affects a particle's ability to act as CCN and is therefore a likely source of uncertainty. Furthermore, errors in remote measurements of marine CCN concentrations may vary with continental and pollution influences, so categorizing measurements based on the particle sources is essential to identify sources of uncertainty.
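
The Poisson counting error quoted above, sqrt(N)/N, is equivalent to 1/sqrt(N). The short sketch below only illustrates how that relative error shrinks as more particles are counted. The function name is hypothetical, and N is assumed here to be the raw number of particles counted in one sampling interval.

```python
import math


def relative_counting_error(n_counted: float) -> float:
    """Relative Poisson counting error, sqrt(N)/N, for N counted particles.

    Returns NaN when nothing was counted, because the ratio is undefined.
    """
    if n_counted <= 0:
        return math.nan
    return math.sqrt(n_counted) / n_counted


# Illustrative counts only (not campaign data).
for n in (4, 100, 10_000):
    print(f"N = {n:>6d}  relative error = {relative_counting_error(n):.3f}")
```

Counting 100 particles in an interval gives a 10% relative error under this assumption, which is why longer averaging periods reduce the counting uncertainty in clean marine air.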

Particle concentration and size distributions.
Submicron particle composition is analyzed with a high-resolution time-of-flight aerosol mass spectrometer (AMS, Aerodyne Research Inc., Billerica, MA) 49 that typically measures bulk non-refractory inorganic (sulfate, ammonium, nitrate, chloride) and organic components. Some AMS modes can measure single-particle composition, but these measurements are not widely available for airborne campaigns. The approximate size range of measured particles is 0.06-0.60 µm vacuum aerodynamic diameter and can vary slightly by instrument. No correction has been applied to account for mass outside this diameter range. The uncertainty is about 50% and 25% for airborne and ship measurements, respectively, based on sample scan times, processing assumptions, and instrument limitations. Ship-borne AMS measurements typically have longer sample scan times than airborne measurements because the air mass changes relatively slowly given the slow progress of the ship compared to aircraft. The longer sampling time for ship measurements reduces the uncertainty relative to the aircraft measurements 50. Although the AMS collection efficiency varies by instrument and with atmospheric conditions, and is sometimes not corrected for or is assumed in the published datasets, the non-refractory particle composition ratio is still a relevant and valuable quantity for evaluating bulk particle chemical properties. On aircraft, during NAAMES, ACTIVATE, and CAMP2Ex, the AMS measures particle mass concentrations in V-mode (high sensitivity) at 30-second intervals.

(2023) 10:471 | https://doi.org/10.1038/s41597-023-02372-z www.nature.com/scientificdata
During the NAAMES campaigns, the AMS on the R/V Atlantis had much longer sample cycle times. Each sample cycle lasts a total of 5 minutes. During each cycle, the ambient aerosol is sampled in V-mode (high sensitivity) for 2 min, W-mode (high resolution) for 1 min, and event-trigger-mode (single particle composition) for 2 min. Supermicron particles were removed prior to sampling with a sharp-cut cyclone (SCC 2.229, BGI Inc. US).
Finally, during the CAPRICORN2 campaign on the R/V Investigator, an Aerosol Chemical Speciation Monitor (ACSM) measured particle mass compositions similar to the AMS. The ACSM measures the same components as the AMS plus methanesulfonic acid (MSA). The upper size cut is approximately 1.0 µm diameter, and the transmission efficiency drops below 0.1 µm in aerodynamic diameter.
Pollutants. Refractory black carbon particle mass is measured with a Single Particle Soot Photometer (SP2, DMT, Boulder, CO) to identify anthropogenic pollution. The SP2 derives black carbon mass by measuring laser-induced particle incandescence for particles between 0.07 and 0.50 µm in diameter, assuming a black carbon density of 1.8 g cm⁻³. The SP2 sampling intervals on the NAAMES ship and on all aircraft campaigns are 60 s and 10 s, respectively. The SP2 instruments used in these studies are calibrated to similar but slightly different diameter ranges, reported in the originally published campaign datafiles. Black carbon mass outside the calibrated range is still accounted for using a log-normal fit to the distributions. For SP2 measurements collected in-flight, where sampling rates are increased due to the considerable distance covered by the aircraft, the log-normal fit is applied to the flight-averaged data to produce a correction coefficient that is applied to the entire flight. The correction coefficient typically increases the black carbon mass by less than 10%.

During the NAAMES and ORACLES aircraft campaigns and the MARCUS ship campaign, the carbon monoxide mixing ratio is measured with a CO/CO2 gas analyzer (Los Gatos Research, San Jose, CA). During the ACTIVATE and CAMP2Ex campaigns, the carbon monoxide mixing ratio is measured using a G2401-m in-flight Gas Concentration Analyzer (Picarro Inc., Santa Clara, CA).

On both the NAAMES and CAPRICORN2 ship campaigns, radon activity concentration was measured in mBq m⁻³ as a tracer for continental influences on marine air masses. Radon is a naturally occurring radioactive gas emitted by soil and rocks and has a half-life of about 3.8 days. In both campaigns, a dual-flow-loop, two-filter detector 51 is used for measurements.
In-cloud measurements. The Cloud Droplet Probe (CDP, DMT, Boulder, CO) measures cloud droplet size distributions for droplets ranging from 2-50 µm in diameter. The Fast Cloud Droplet Probe (FCDP, SPEC Inc., Boulder, CO) also measures cloud droplet distributions of droplets from 1.5-50 µm in diameter. Both the CDP and FCDP are sensitive to some coarse mode aerosol. The Fast Forward Scattering Spectrometer Probe (FFSSP, SPEC Inc., Boulder, CO) also measures particles from 1.5 to 50 µm in diameter. The Cloud-Aerosol Spectrometer (CAS, DMT, Boulder, CO) measures particles ranging from 0.51-50 µm in diameter at ambient relative humidity. The CDP, FCDP, FFSSP, and CAS measure both liquid and ice particles and cannot distinguish between the two phases. The Cloud Imaging Probe (CIP, DMT, Boulder, CO) measures the size of particles from 25-1550 µm with 25 µm bins. The Two-Dimensional Stereo Optical Array Probe (2D-S, SPEC Inc., Boulder, CO) images cloud particles to obtain droplet sizes ranging from 25-1280 µm in diameter. For both the 2D-S and the CIP, the smallest bins (<50 µm) are excluded from the aggregated datasets due to the large uncertainty associated with them. The High-Volume Precipitation Spectrometer (HVPS, SPEC Inc., Boulder, CO) combines the 2D-S optoelectronics with optics and probe tips designed to minimize shattering and images particles as large as 1.92 cm with a 150 µm pixel resolution. The CIP, HVPS, and CPI can identify the particle phase. Considerable uncertainty in the particle size and concentration from optical probes is introduced during data processing because there is no consensus on how to handle partially imaged particles, shattered particles, out-of-focus particles, and other caveats that require assumptions. In addition, smaller particles (<150 µm) are particularly uncertain in size because they cover only a small number of pixels and have a highly uncertain depth of field.
The Hawkeye is a combination of four probes in one and was originally developed by SPEC Inc. to fly on the NASA Global Hawk unmanned aerial vehicle. The four probes include an FCDP, two 2D-S probes, one of which is modified to have a 50 µm resolution and a size range of 50-6400 µm, and a Cloud Particle Imager (CPI, SPEC Inc., Boulder, CO), which images particles with 2.3 µm bin resolution and has a size range of 2.3-2300 µm. Measurements from the Hawkeye FCDP are referred to as HawkFCDP in Fig. 2. The King probe (DMT, Boulder, CO) measures the cloud liquid water content (LWC) from hot-wire measurements with an uncertainty of 15% 52. The Nevzorov probe 53 has two separate sensors, one for measuring cloud LWC and one for measuring cloud total water content (TWC, i.e., LWC + ice water content), with an uncertainty of 20%. Vertical velocity is derived from differential pressure measurements, is corrected for aircraft heading, and has an uncertainty of ±0.1 m s⁻¹.
Some probe data are excluded from this data descriptor after careful analysis. The Flight Probe Dual Range Phase Doppler Interferometer (FPDR-PDI, Artium Technologies Inc., Sunnyvale, CA) was only used during the ORACLES campaign and is a redundant measurement, as it overlaps entirely with the CDP and CAS size range. For this overlapping size range, Gupta et al. 54 determined that the CAS and CDP were the better measurements for 2016 and 2017-2018, respectively; therefore, the FPDR-PDI measurements are excluded. In addition, during SOCRATES, the Precipitation Imaging Probe (PIP, DMT, Boulder, CO) and a 2-Dimensional Cloud probe (2D-C), which measure large precipitation-sized particles, were deemed unusable due to a problem with the time record and degraded image quality, respectively 55.

Remote measurements. The NCAR nadir/zenith-pointing High-Spectral Resolution Lidar (HSRL) measured the vertical curtain of aerosol extinction, backscatter, and depolarization at a 0.532 µm wavelength 56. Similarly, the NASA nadir-pointing HSRL made the same vertical curtain measurements and additionally measured backscatter and depolarization at 1.064 µm wavelength 57. With these measurements, particle type 58,59 and CCN concentration can be estimated based on methods developed by Georgoulias et al. 17, who used CALIPSO lidar measurements to estimate surface CCN concentration. Additionally, traditional remote vertically integrated aerosol optical depth (AOD) and extinction measurements can be compared directly with in-situ CCN measurements as a baseline comparison. The HSRL can be considered a useful surrogate of what could be obtainable from future satellites 36,37. Also, the HSRL has a higher signal-to-noise ratio and more overlap with in-situ measurements than the CALIPSO polar-orbiting satellite, enabling a better statistical comparison.
Similarly, many campaigns utilized a Research Scanning Polarimeter (RSP), which can extract aerosol products by measuring the upwelling total and polarized reflectance at multiple angles and spectral bands using refractive telescopes. With these measurements, the microphysical aerosol properties from polarimetry (MAPP) algorithm derives a bi-modal particle size distribution ranging from 0.094 to 5.1 µm 35. RSP measurements are not currently available from satellites.
Particle hygroscopic growth and CCN. Two 3-wavelength integrating nephelometers (0.450, 0.550, 0.700 µm) (Model 3563, TSI, St. Paul, MN) measure the total scattering, one at dry RH (<40%) and the other at high RH (~80%), with a 25% uncertainty. Scattering coefficients are corrected for truncation errors using Anderson and Ogren 60. Similarly, the aerosol absorption is derived from two Radiance Research 3-wavelength (0.470, 0.532, 0.660 µm) Particle Soot Absorption Photometers (PSAP), one at dry RH (<40%) and the other at high RH (~80%), with up to 50% uncertainty in coarse aerosol (rare in the marine boundary layer relative to the continental boundary layer). Data were corrected for a variety of errors using Virkkula 61. The nephelometer and PSAP pairs can be used to calculate the aerosol extinction coefficient by first using the measured Ångström exponent to convert the scattering coefficient to 0.532 µm and then taking the sum of the absorption and scattering. With scattering coefficients for wet and dry air, the aerosol extinction at 0.532 µm is derived for the measured ambient humidity by assuming a single-parameter monotonic growth curve 62,63. These measurements are particularly useful for testing assumptions involving hygroscopic growth corrections for remote measurements made at ambient relative humidity. During some of the ACTIVATE flights, the drying of the dry nephelometer was suboptimal, resulting in a smaller ambient relative humidity range for particle growth correction. Only measurements from campaigns with both wet and dry scattering measurements are helpful for this analysis, as both are necessary to derive the hygroscopic influence on particle scattering at ambient relative humidity. The theoretical cut size for the nephelometers and PSAPs is 3-4 µm 46,64.
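
The extinction calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the campaign processing code: the Ångström exponent is taken from the 0.450 and 0.700 µm channels, the same exponent is applied to the humidified scattering, the growth curve is assumed to be the common power law in (100 - RH), absorption is assumed not to change with humidity, and all function names and numerical values are hypothetical.

```python
import math


def angstrom_exponent(sigma_a, sigma_b, wl_a_um, wl_b_um):
    """Scattering Angstrom exponent from coefficients at two wavelengths."""
    return -math.log(sigma_a / sigma_b) / math.log(wl_a_um / wl_b_um)


def shift_wavelength(sigma_ref, wl_ref_um, wl_target_um, alpha):
    """Move a scattering coefficient to another wavelength with a power law."""
    return sigma_ref * (wl_target_um / wl_ref_um) ** (-alpha)


def growth_gamma(sigma_dry, sigma_wet, rh_dry, rh_wet):
    """Single growth parameter from a dry/wet scattering pair (RH in %)."""
    return -math.log(sigma_wet / sigma_dry) / math.log(
        (100.0 - rh_wet) / (100.0 - rh_dry)
    )


def scattering_at_rh(sigma_dry, rh_dry, rh_target, gamma):
    """Scattering at a target RH using the assumed power-law growth curve."""
    return sigma_dry * ((100.0 - rh_target) / (100.0 - rh_dry)) ** (-gamma)


# Illustrative values in Mm^-1 and % RH (not campaign data).
scat_dry = {0.450: 30.0, 0.550: 24.0, 0.700: 18.0}
scat_wet_550 = 40.0
abs_dry_532 = 2.0
rh_dry, rh_wet, rh_ambient = 30.0, 80.0, 65.0

alpha = angstrom_exponent(scat_dry[0.450], scat_dry[0.700], 0.450, 0.700)
scat_dry_532 = shift_wavelength(scat_dry[0.550], 0.550, 0.532, alpha)
scat_wet_532 = shift_wavelength(scat_wet_550, 0.550, 0.532, alpha)
gamma = growth_gamma(scat_dry_532, scat_wet_532, rh_dry, rh_wet)
scat_ambient_532 = scattering_at_rh(scat_dry_532, rh_dry, rh_ambient, gamma)

# Extinction is the sum of scattering and absorption at 0.532 um.
ext_ambient_532 = scat_ambient_532 + abs_dry_532
print(f"ambient extinction at 0.532 um: {ext_ambient_532:.1f} Mm^-1")
```

Because the growth parameter is fitted to a single dry/wet pair, the curve reproduces both measured points exactly and interpolates monotonically between them.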
The Stream-wise thermal gradient continuous-flow CCN counters (DMT, Boulder, CO) and custom-built miniature versions 65 measure CCN spectra over a scanned range of supersaturations and CCN concentrations at constant supersaturations. The constant and scan supersaturation ranges vary by campaign (Figs. 3, 4).

Data Records
An aggregated dataset, consisting of time series with all in-situ aircraft or ship campaign measurements presented in Tables 2, 3, is available through Dryad 66. All missing or invalid data flags are converted to 'Na'. Some datasets have already been filtered for inlet shattering in-cloud and measurement contamination from ship exhausts; however, methods of filtering ship exhaust vary by campaign. For the NAAMES ship campaigns, the research ship exhaust was identified and filtered out based on the wind direction relative to the ship exhaust and total particle counts. For CAPRICORN2, wind direction, total particle counts, black carbon particle concentration, and CO and CO2 measurements were also utilized in filtering ship exhaust 67. Finally, the MARCUS ship exhaust contamination periods are identified and filtered using total particle counts and CO measurements 67. The aggregated dataset is further filtered to eliminate measurements influenced by in-cloud inlet shattering and averaged at 10-second intervals for aircraft measurements and 5-minute intervals for ship measurements (except for CAPRICORN2, which is only publicly available at hourly averaged intervals). While the remote HSRL and RSP measurements are briefly discussed, they are not included in the aggregated dataset. Remote retrieval methods for aerosol, CCN, and CCN proxies are constantly being updated and improved. Therefore, the raw data may be necessary to validate methods developed in the future. The remote measurements can be found in each campaign's data archive, discussed in the last paragraph of this section. Evaluating current published remote retrieval methods is a focus of ongoing work.
In the discussed campaigns, many instruments measure particle, cloud, drizzle, and precipitation size distributions (discussed in the methods section and shown in Tables 2, 3). Almost all have different size range limitations. Here we create value-added products and size distinctions in the aggregated dataset to highlight two other variables related to CCN concentration and to create consistency between instrumentation products with varying limitations. First, we create a CCN proxy based on particle size from instruments that measure submicron particles. The proxy is calculated as the total number of particles > 0.1 µm diameter. How well this proxy represents the CCN concentration depends on the particle composition and supersaturation (Fig. 4). In the aggregated dataset, this CCN proxy is identified by variable names that start with the instrument abbreviation and end with ">0.1" to denote that they represent only the particles greater than 0.1 µm diameter (e.g., LAS_>0.1). The general quantity is referred to as CN>0.1. Note that this proxy's upper particle size limit is instrument-dependent and shown in Tables 2, 3 (typically >0.6 µm). CN>0.1 is sometimes a poor proxy, particularly in low-supersaturation conditions and when the particles are hydrophobic 68, but it takes advantage of the fact that a particle's size greatly influences its ability to thermodynamically activate and form a cloud droplet due to the Kelvin effect 21,25-27 and has the advantage of often being consistently available, unlike CCN concentrations. Notably, the chemical composition significantly impacts CCN activation ability at smaller sizes and higher supersaturations 65. Second, the in-cloud measurements of cloud droplet number concentration, total water content, and diameter, excluding drizzle and precipitation, are restricted to particles that are <50 µm in diameter and have a minimum size of 1.5-3 µm in diameter (instrument dependent).
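
As an illustration of the size-based CCN proxy described above, the sketch below integrates a binned number size distribution above 0.1 µm. It is a minimal example rather than the code used to build the dataset: the function name and input layout are hypothetical, and a bin that straddles the threshold is split by assuming particles are spread uniformly in log-diameter within the bin.

```python
import math


def cn_above(edges_um, dn_cm3, threshold_um=0.1):
    """Total number concentration (cm^-3) of particles larger than a threshold.

    edges_um: bin edges in micrometres, ascending, one more than the bin count.
    dn_cm3:   number concentration in each bin (cm^-3); NaN bins are skipped.
    """
    total = 0.0
    for lo, hi, dn in zip(edges_um[:-1], edges_um[1:], dn_cm3):
        if math.isnan(dn) or hi <= threshold_um:
            continue  # missing bin, or bin entirely below the threshold
        # Fraction of the bin above the threshold in log-diameter space
        # (equals 1 when the whole bin is above the threshold).
        frac = (math.log(hi) - math.log(max(lo, threshold_um))) / (
            math.log(hi) - math.log(lo)
        )
        total += frac * dn
    return total


# Illustrative distribution (not campaign data).
edges = [0.05, 0.1, 0.2, 0.4]
counts = [100.0, 50.0, 10.0]
print(cn_above(edges, counts))
```

The upper integration limit is simply the last bin edge supplied, which mirrors the note above that the proxy's upper size limit depends on the instrument.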
In the literature, several thresholds are proposed to define at what size coalescence is efficient, leading to drizzle and precipitation. Such thresholds typically range from 40 to 80 µm diameter 69. In addition, a typical upper diameter limit for instruments measuring the smallest cloud droplets is 50 µm. This has caused the sub-50 µm diameter droplet concentration to often be reported as the non-drizzling cloud-droplet number concentration. For these reasons, we integrate cloud droplet properties for droplets up to 50 µm diameter. These cloud droplet measurements are identified by variable names that end with "<50" to denote that these variables represent cloud droplet measurements of droplets less than 50 µm diameter. Measurements of droplets greater than 50 µm in diameter are separately defined with variable names ending with ">50". These likely represent drizzle or precipitation-sized droplets and could help identify the impact of drizzle and precipitation on remote sensing retrievals estimating cloud properties. Specifically, the CIP and 2D-S are available in the aggregated dataset as total droplet concentrations greater than 50 µm diameter.

Fig. 4 Comparisons of in-situ CCN measurements at various supersaturation levels and total CN greater than 100 nm in diameter for both aircraft and ship-based campaigns. CN greater than 100 nm in diameter is derived from several instruments, depending on what is available in each campaign. All size distribution measurements are of dried particles except for the PCASP measurements shown for ORACLES1 because no dry particle distribution was available. Aircraft and ship measurements shown are 10-second and 5-minute averages, respectively, except for CAPRICORN2, which are hourly averages.

If full size distributions or different thresholds are needed instead of the integrated value-added products, the data can be obtained from permanent archives. Measurements are retrieved from several data repositories to create the aggregated dataset. The original data for the NASA-led NAAMES 70

Technical Validation
Cloud droplet measurements. The in-situ CDNC measurements from the campaigns have redundant instrumentation for validation. Figure 2 shows CDNC comparisons from two separate instruments for all aircraft-based campaigns except for SOCRATES. SOCRATES is the only campaign without a redundant CDNC measurement; however, a King-probe LWC measurement is available and compared to the integrated CDP CDNC volume (r = 0.90). Before comparing measurements from the two instruments, LWC measurements were used to determine if the measurements were in or out of the cloud. All instruments that measured LWC were time synced by maximizing the cross-correlation between measurements, and the result was visually verified. Following the time sync, measurements were determined to be in-cloud when any instrument's cloud LWC was greater than 0.02 g m⁻³. Then a 10-second buffer was applied to remove cases at cloud edges (i.e., measurements within 10 seconds of exiting or entering a cloud are excluded). Finally, a 10-second average was applied to the CDNC measurements to produce Fig. 2.
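
The in-cloud screening described above (LWC threshold, 10-second edge buffer, 10-second average) can be sketched as follows. This is a simplified illustration, not the published processing code: it assumes 1 Hz series that are already time-synced, uses non-overlapping 10-second blocks for the average, and all function names are hypothetical.

```python
import math


def in_cloud_flags(lwc_series, threshold=0.02):
    """True where any probe's LWC (g m^-3) exceeds the threshold.

    lwc_series: list of equal-length 1 Hz LWC lists, one per probe.
    NaN values compare as False, so a missing probe never flags cloud.
    """
    return [any(v > threshold for v in sample) for sample in zip(*lwc_series)]


def apply_edge_buffer(flags, buffer_s=10):
    """Keep in-cloud samples only if every sample within +/- buffer_s is in cloud."""
    n = len(flags)
    core = []
    for i, inside in enumerate(flags):
        window = flags[max(0, i - buffer_s): min(n, i + buffer_s + 1)]
        core.append(inside and all(window))
    return core


def block_average(values, keep, width=10):
    """Average kept, non-NaN values in consecutive blocks of `width` samples."""
    means = []
    for start in range(0, len(values) - width + 1, width):
        kept = [
            v
            for v, k in zip(values[start:start + width], keep[start:start + width])
            if k and not math.isnan(v)
        ]
        means.append(sum(kept) / len(kept) if kept else math.nan)
    return means


# Synthetic 60 s record: one cloud between 10 s and 49 s (not campaign data).
lwc_king = [0.0] * 10 + [0.3] * 40 + [0.0] * 10
lwc_other = [math.nan] * 60
cdnc = [100.0] * 60

flags = in_cloud_flags([lwc_king, lwc_other])
core = apply_edge_buffer(flags)
cdnc_10s = block_average(cdnc, core)
print(sum(core), cdnc_10s)
```

In this synthetic case the 40-second cloud pass keeps only its central 20 seconds, which shows how strongly the edge buffer reduces the sample count for short cloud penetrations.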
The CDNC comparisons generally show good agreement (r ≥ 0.70). The worst correlations were from the NAAMES airborne campaigns in comparisons with a CDP and a CAS because there were often problems with the CAS significantly undercounting the CDNC. For this reason, it is recommended that the CDP measurements are prioritized over the CAS for CDNC for the NAAMES dataset. Excluding the CDNC comparisons from NAAMES, the correlations are strong (r ≥ 0.89). It is worth noting the FCDP may provide more accurate measurements than the CDP, CAS, or FFSSP in conditions where CDNC is high (>200 cm⁻³) due to the higher possibility of coincidence in the CDP, which leads to undercounting and over-sizing of cloud droplets 86-88.

Aerosol and CCN measurements. The in-situ CCN concentration is compared to total CN (Fig. 3) and CN greater than 0.1 µm (CN>0.1) (Fig. 4) for all aircraft and ship-based campaigns. For aircraft-based subsaturated measurements, TWC is used to exclude in-cloud measurements. For out-of-cloud measurements, the TWC is required to equal 0 g m⁻³, a 10-second buffer is applied to remove measurements near the cloud that may be influenced by inlet shattering (i.e., measurements within 10 seconds of exiting or entering a cloud are excluded), and a 10-second average is applied. Ship-based measurements are averaged at 5-minute intervals, except for the CAPRICORN2 dataset, which is provided at hourly averaged intervals.
As expected, the comparison in Fig. 3 shows CCN concentration is less than or equal to the CN concentration (within the measurement error), with measurements at higher supersaturations matching more closely with the CN concentration than at lower supersaturations. As discussed in the introduction, remote detection methods often approximate CCN concentration by differentiating particles by size. Therefore, CN>0.1 (a value-added product defined in the Data Records section as particles greater than 0.1 µm diameter) acts as a relatively good proxy for CCN concentrations 21. Comparing CCN and CN>0.1 is not a one-to-one comparison because particles both larger and smaller than 0.1 µm in diameter may or may not be CCN active depending on the level of supersaturation and particle composition. However, particles above 0.1 µm diameter are more likely to act as CCN than smaller ones, which produces some correlation between the two variables and makes CN>0.1 a necessary approximation in the absence of CCN measurements.

Usage Notes
The measurements shown in this data descriptor and other in-situ measurements in Tables 2, 3 are processed from the publicly available data repositories using customized Python code. The Python code is available for transparency; however, because it requires a large number of input data files and must account for variable file formats, the output files with the time-synced datasets are also provided in '.csv' format.
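
A minimal sketch of reading one of the '.csv' output files is given below. It relies only on the 'Na' missing-data flag described in the Data Records section; the column names and values in the example are hypothetical stand-ins, and the actual file layout should be checked against the Dryad archive.

```python
import csv
import io
import math


def read_aggregated_csv(handle, missing_flag="Na"):
    """Read rows as dicts, mapping the missing-data flag to NaN.

    Numeric fields become floats; anything else (e.g. timestamps) stays text.
    """
    rows = []
    for record in csv.DictReader(handle):
        row = {}
        for key, text in record.items():
            if text is None or text.strip() in (missing_flag, ""):
                row[key] = math.nan
                continue
            try:
                row[key] = float(text)
            except ValueError:
                row[key] = text
        rows.append(row)
    return rows


# In-memory stand-in for a campaign file; column names are hypothetical.
example = io.StringIO(
    "time,CDP_<50,LAS_>0.1\n"
    "2020-02-14 15:00:00,120.5,85.0\n"
    "2020-02-14 15:00:10,Na,90.2\n"
)
rows = read_aggregated_csv(example)
valid = [r["CDP_<50"] for r in rows if not math.isnan(r["CDP_<50"])]
print(len(rows), valid)
```

For a file on disk, the same function can be called with `open(path, newline="")` as the handle.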
This aggregated dataset is compiled with the purpose of validating remote retrievals of CDNC, CCN, and CCN proxies from satellite, HSRL, and RSP data available at the time of the campaigns. The goal is to improve satellite retrievals and model representation of aerosol and cloud properties. Other possible uses of this dataset include:
• Statistical studies quantifying the influence of regional pollutant perturbations and environmental differences on marine particle and cloud properties, particularly precipitation susceptibility and cloud lifetime.
• Evaluation of model-simulated aerosol and cloud properties, specifically the model representation of particle source, transport, composition, and CCN activity when forming clouds.
• Aerosol and cloud microphysical studies with CCN and CDNC closure studies to identify the sensitivity of cloud microphysical and radiative properties to the aerosol and boundary layer dynamics.
• Validation of CDNC, CCN, and CCN proxy detection methods via remote satellite, HSRL, and RSP methods that are developed in the future, assuming the raw measurements can be reprocessed based on the new methods.

Code availability
The Python code is available for transparency and use through Dryad (Data Citation 1 66). However, because of the large number of input data files and the need to account for variable file formats, the code contains many custom sections written for specific datasets, which makes it complicated. The coding environment dependencies and versions are included in a .yml file.